We propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed architecture allows multiple networks to be processed in parallel for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusions. This network outputs a large variety of pedestrian candidates to cover the majority of ground-truth pedestrians, at the cost of also introducing a large number of false positives. Next, multiple deep neural networks are used in parallel to further refine these pedestrian candidates. We introduce a soft-rejection based network fusion method that fuses the soft metrics from all networks to generate the final confidence scores. Our method performs better than existing state-of-the-art methods, especially when detecting small and occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the network fusion architecture as a reinforcement to the pedestrian detector. The approach outperforms state-of-the-art methods on most protocols on the Caltech Pedestrian dataset, with significant boosts on several protocols. It is also faster than all other methods.
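
To make the idea of soft rejection concrete, the following is a minimal sketch rather than the exact rule used in this work. It assumes that each refinement network returns a pedestrian probability for a candidate, and that the candidate generator's score is multiplied by one scaling factor per network. The function name `soft_reject_fuse` and the values `threshold=0.7` and `floor=0.1` are illustrative assumptions that are not given in the abstract.

```python
from math import prod


def soft_reject_fuse(candidate_score, classifier_probs, threshold=0.7, floor=0.1):
    """Fuse one candidate's detector score with refinement-network outputs.

    `threshold` and `floor` are hypothetical values chosen for illustration.
    """
    # A probability above `threshold` gives a factor > 1 (boost);
    # a probability below it gives a factor < 1 (penalty).
    # The factor never drops under `floor`, so a single network
    # cannot eliminate a candidate on its own (soft, not hard, rejection).
    factors = [max(p / threshold, floor) for p in classifier_probs]
    return candidate_score * prod(factors)


# Two confident refinement networks raise the candidate's score.
fused_high = soft_reject_fuse(0.5, [0.9, 0.8])

# One network strongly disagrees: the score is reduced but stays positive.
fused_low = soft_reject_fuse(0.5, [0.9, 0.0])
```

Because the factors are multiplied rather than thresholded, a candidate that one network scores poorly can still survive if the other networks support it strongly, which is the property that distinguishes soft rejection from discarding candidates outright.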